1. A method of generating a definition for a collection of
source documents comprising:

identifying patterns common to each source document in
the collection of source documents; and

constructing for an element type in the collection of
source documents a restrictive general rule based on the
identified common patterns.

2. The method of claim 1, wherein identifying common
patterns comprises:

identifying common attribute names and types.

3. The method of claim 2, wherein identifying common
patterns further comprises:

identifying restricted attribute values associated with
the common attribute names and types.



4. The method of claim 2, wherein identifying common
attribute names and types comprises:

determining the number of occurrences of each attribute
name on an element type;

examining the attribute values for each occurrence of
each attribute name on the same element type to determine the
attribute type; and

determining if the attribute name occurs in association
with the same attribute value on more than one element type.
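
By way of non-limiting illustration, the three steps of claim 4 might be sketched as follows. The function names `infer_type` and `survey_attributes`, the type labels "integer", "decimal" and "string", and the use of Python's standard `xml.etree.ElementTree` parser are assumptions made for this example and do not appear in the claims.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def infer_type(values):
    """Guess a simple type label from observed attribute values (hypothetical labels)."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_match(int):
        return "integer"
    if all_match(float):
        return "decimal"
    return "string"


def survey_attributes(documents):
    """Return (counts, types, shared) for a list of XML document strings."""
    counts = Counter()            # (element type, attribute name) -> occurrences
    values = defaultdict(list)    # (element type, attribute name) -> observed values
    seen_on = defaultdict(set)    # (attribute name, value) -> element types
    for text in documents:
        for elem in ET.fromstring(text).iter():
            for name, value in elem.attrib.items():
                counts[(elem.tag, name)] += 1
                values[(elem.tag, name)].append(value)
                seen_on[(name, value)].add(elem.tag)
    types = {key: infer_type(vals) for key, vals in values.items()}
    # Name/value pairs that occur on more than one element type.
    shared = {key: tags for key, tags in seen_on.items() if len(tags) > 1}
    return counts, types, shared
```

Here `counts` corresponds to the first step, `types` to the second, and `shared` to the third; a practical implementation would likely recognise more attribute types than the three assumed here.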

5. The method of claim 3, wherein identifying restricted
attribute values comprises:

examining attribute values for each occurrence of an
attribute type in all of the source documents in the collection
of source documents; and

establishing an enumeration or a restricted range
appropriate to the attribute type.
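
As a hedged sketch only, establishing an enumeration or a restricted range might look like the following. The function name `restrict_values`, the type labels, and the `max_enum` cut-off of five distinct values are assumptions for illustration; the claims do not specify how the choice between an enumeration and a range is made.

```python
def restrict_values(values, attr_type, max_enum=5):
    """Return ("range", (low, high)), ("enumeration", [...]) or ("unrestricted", None)."""
    if attr_type in ("integer", "decimal"):
        cast = int if attr_type == "integer" else float
        numbers = [cast(v) for v in values]
        # Numeric attribute types get a range spanning the observed values.
        return ("range", (min(numbers), max(numbers)))
    distinct = sorted(set(values))
    if len(distinct) <= max_enum:
        # Few distinct values: treat them as an enumeration.
        return ("enumeration", distinct)
    return ("unrestricted", None)
```

For example, the observed values "kg", "lb", "kg" on a string attribute would yield the enumeration of "kg" and "lb", while "2", "10", "3" on an integer attribute would yield the range 2 to 10.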

6. The method of claim 5, wherein identifying restricted
attribute values further comprises:

applying a heuristic to identify errors in the
collection of source documents; and

adjusting the established enumeration or restricted
range for attribute values.
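
The claims do not say which heuristic is applied. One possible heuristic, assumed here purely for illustration, treats a value that accounts for less than a small share of all occurrences as a suspected error and leaves it out of the enumeration. The function name `prune_rare_values` and the 5% `min_share` default are hypothetical.

```python
from collections import Counter


def prune_rare_values(values, min_share=0.05):
    """Split observed values into (kept enumeration, suspected errors)."""
    freq = Counter(values)
    total = len(values)
    # A value is kept only if it makes up at least min_share of all occurrences.
    kept = sorted(v for v, n in freq.items() if n / total >= min_share)
    suspect = sorted(v for v, n in freq.items() if n / total < min_share)
    return kept, suspect
```

Under this assumption, a single "kq" among fifty unit values would be flagged as a likely mistyping of "kg" rather than added to the enumeration.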

7. The method of claim 1, wherein constructing a
restricted general rule comprises:

constructing a content model that specifies the
sequence order and number of occurrences of sub-elements within
the common pattern.

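For illustration only, a content model of the kind recited in claim 7 might be derived as sketched below, using DTD-style notation (`?`, `+`, `*`). The function name `content_model`, the input shape (one list of child tag names per instance of the element type), and the fallback to a choice group are assumptions for this example.

```python
from itertools import groupby


def content_model(child_sequences):
    """Derive a DTD-style content model from lists of child tag names."""
    # Collapse consecutive repeats: ["a", "b", "b"] -> [("a", 1), ("b", 2)].
    runs = [[(tag, len(list(g))) for tag, g in groupby(seq)]
            for seq in child_sequences]
    order = []
    for run in runs:
        for tag, _ in run:
            if tag not in order:
                order.append(tag)
    if not order:
        return "EMPTY"
    position = {tag: i for i, tag in enumerate(order)}
    for run in runs:
        indexes = [position[tag] for tag, _ in run]
        # A repeated or out-of-order tag means this simple ordering does not fit.
        if indexes != sorted(set(indexes)):
            return "(" + " | ".join(order) + ")*"
    suffixes = {(False, False): "", (True, False): "?",
                (False, True): "+", (True, True): "*"}
    parts = []
    for tag in order:
        counts = [dict(run).get(tag, 0) for run in runs]
        optional, repeated = min(counts) == 0, max(counts) > 1
        parts.append(tag + suffixes[(optional, repeated)])
    return "(" + ", ".join(parts) + ")"
```

The sketch orders sub-elements by first appearance, so it may fall back to the looser choice group in cases where a stricter sequence would still fit every instance; a fuller implementation would search for a consistent ordering first.
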
8. The method of claim 2, wherein constructing a
restricted general rule comprises:

constructing attribute definitions and value rules for
each identified common attribute name and type.
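
As a minimal sketch, attribute definitions and value rules could be emitted as a DTD attribute-list declaration. The function name `attlist_declaration`, the shape of its `attributes` argument, and the way an attribute is marked as required are assumptions for this example, not taken from the claims.

```python
def attlist_declaration(element, attributes):
    """Format a DTD ATTLIST declaration.

    `attributes` maps an attribute name to (allowed_values_or_None, required).
    """
    lines = []
    for name, (allowed, required) in attributes.items():
        # An enumeration becomes a choice group; anything else falls back to CDATA.
        kind = "(" + " | ".join(allowed) + ")" if allowed else "CDATA"
        default = "#REQUIRED" if required else "#IMPLIED"
        lines.append(f"  {name} {kind} {default}")
    return f"<!ATTLIST {element}\n" + "\n".join(lines) + "\n>"
```

An attribute observed on every instance of an element type might reasonably be declared `#REQUIRED` and any other `#IMPLIED`, though that rule is likewise an assumption.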

9. The method of claim 1, further comprising:

identifying those patterns found to achieve a
predetermined threshold of commonness; and

constructing a restrictive general rule for those
identified patterns.
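
One way to read "threshold of commonness", assumed here for illustration, is the fraction of source documents in which a pattern occurs. The function name `common_patterns`, the representation of a pattern as a string, and the 0.8 default threshold are hypothetical.

```python
from collections import Counter


def common_patterns(patterns_per_document, threshold=0.8):
    """Keep patterns present in at least `threshold` of the documents."""
    doc_count = len(patterns_per_document)
    support = Counter()
    for patterns in patterns_per_document:
        support.update(set(patterns))   # count each pattern once per document
    return {p for p, n in support.items() if n / doc_count >= threshold}
```

Lowering the threshold admits patterns that appear in only some documents, which trades a more permissive rule for broader coverage of the collection.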



10. A computer program residing on a computer-readable
medium for building a document type definition for a collection
of source documents, the computer program comprising instructions
causing a computer system to:

identify patterns common to each source document in the
collection of source documents; and

construct for an element type in the collection of
source documents a restrictive general rule based on the
identified common patterns.

11. The computer program of claim 10, wherein the
instructions to identify common patterns comprise instructions
to:

identify common attribute names and types.

12. The computer program of claim 11, wherein the
instructions to identify common patterns further comprise
instructions to:

identify restricted attribute values associated with
the common attribute names and types.

13. A computer system comprising:

a storage device for storing a set of source documents;
and

a computer processor configured by a document type
definition building program to identify patterns common to each
source document in the set of source documents and construct for
an element type in the set of source documents a restrictive
general rule based on the identified common patterns.

14. A method of converting a format of a first source
document to a format of a similarly structured second source
document, the method comprising:

identifying patterns common to the first and second
source documents; and

using the identified common patterns to map elements
and sub-elements in the first source document to equivalent
elements and sub-elements in the second source document.
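
The claims leave the mapping technique open. As one simple possibility, assumed here only for illustration, two similarly structured documents could be walked in parallel and the tag names paired by position. The function name `map_tags_by_shape` is hypothetical, and the sketch relies on the two trees having corresponding children in the same order.

```python
import xml.etree.ElementTree as ET


def map_tags_by_shape(first_xml, second_xml):
    """Pair tag names of two similarly shaped trees by walking them in parallel."""
    mapping = {}

    def walk(a, b):
        # Keep the first pairing seen for each tag in the first document.
        mapping.setdefault(a.tag, b.tag)
        for child_a, child_b in zip(a, b):
            walk(child_a, child_b)

    walk(ET.fromstring(first_xml), ET.fromstring(second_xml))
    return mapping
```

Documents whose structures differ more than this would need a matching step based on content models or other heuristics rather than position alone.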



15. The method of claim 14, further comprising:

replacing tag names for each of the elements and
sub-elements in the first source document with equivalent tag names
of the elements and sub-elements in the second source document.
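
Given a mapping from tag names in the first document to their equivalents in the second, the replacement step might be sketched as below. The function name `rename_tags` and the dictionary form of the mapping are assumptions for this example; tags absent from the mapping are left unchanged.

```python
import xml.etree.ElementTree as ET


def rename_tags(xml_text, tag_map):
    """Return xml_text with element tags renamed according to tag_map."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Fall back to the existing tag when no equivalent is known.
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")
```

Element content and attributes pass through untouched in this sketch; only the tag names change.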



16. The method of claim 14, wherein identifying patterns
common to the first and second source documents comprises:

examining document type definitions for the first and
second source documents.

17. The method of claim 16, further comprising:

producing the document type definition for the first
source document if the document type definition for the first
source document does not already exist.

18. The method of claim 14, wherein identifying patterns
common to the first and second source documents comprises:

performing pattern matching.

19. The method of claim 14, wherein identifying patterns
common to the first and second source documents comprises:

matching heuristics of the patterns in the first source
document to heuristics of the patterns in the second source
document.

20. The method of claim 18, wherein identifying patterns
common to the first and second source documents further
comprises:

matching heuristics of the patterns in the first source
document to heuristics of the patterns in the second source
document.

21. The method of claim 14, wherein using uses the
identified common patterns to map automatically elements and
sub-elements in the first source document to equivalent elements and
sub-elements in the second source document.

22. A method of converting the format of a source document
to the format of a set of source documents, the set of source
documents having a structure similar to the first source
document, the method comprising:

identifying patterns common to the source document and
the set of source documents;

mapping elements and sub-elements in the common pattern
of the source document to equivalent elements and sub-elements
in the common pattern of the set of source documents; and

replacing tag names for each of the elements and
sub-elements in the common pattern of the source document with the
equivalent tag names of the elements and sub-elements in the
common pattern of the set of source documents.

23. The method of claim 22, wherein identifying patterns
common to the source document and the set of source documents
comprises:

examining document type definitions for the source
document and the set of source documents.


24. The method of claim 23, further comprising:

producing the document type definition for the source
document if the document type definition for the source document
does not already exist.

25. A computer program residing on a computer-readable
medium for converting a format of a first source document to a
format of a similarly structured second source document, the
computer program comprising instructions causing a computer
system to:

identify patterns common to the first and second source
documents; and

use the identified common patterns to map elements and
sub-elements of the first source document to equivalent elements
and sub-elements of the second source document.

26. The computer program of claim 25, further comprising
instructions to:

replace tag names for each of the elements and
sub-elements in the common pattern of the first source document with
equivalent tag names of the elements and sub-elements in the
common pattern of the second source document.

27. The computer program of claim 26, wherein the
instructions to identify patterns common to the source document
and the set of source documents comprise instructions to:

examine document type definitions for the source
document and the set of source documents.

28. A computer system comprising:

a storage device for storing a source document and a
set of source documents, the source document having a format
different from that of the set of source documents; and

a computer processor configured by a mapping program to
identify patterns common to the source document and the set of
source documents and map elements and sub-elements in the common
pattern of the source document to equivalent elements and
sub-elements in the common pattern of the set of source documents.